42 research outputs found

    Simplified Two-level Morphophonology

    Peer reviewed

    Two-Level Morphology : A General Computational Model for Word-Form Recognition and Production

    This dissertation presents a new computationally implemented linguistic model for morphological analysis and synthesis. The model incorporates a general formalism for making morphological descriptions of particular languages, and a language-independent program implementing the model. The two-level formalism and the structure of the program are formally defined. The program can utilize descriptions of various languages, including highly inflected ones such as Finnish, Russian, or Sanskrit. The new model is unrestricted in scope and is capable of handling the entire language system as well as ordinary running text. A full description of Finnish inflectional morphology is presented in order to validate the model. The two-level model is based on a lexicon system and a set of two-level rules. It differs from generative phonology in the following respects. The rules are parallel, whereas the rewriting rules of generative phonology are sequentially ordered. The two-level model is fully bidirectional both conceptually and processually. It can also be interpreted as a morphological model of the performance processes of word-form recognition and production. The model and the descriptions are based on computationally simple machinery, mostly on small finite state automata. The computational complexity of the model is discussed, and the description of Finnish is evaluated with respect to external evidence from child language acquisition. Peer reviewed
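
    The parallel-rule idea in the abstract above can be illustrated with a minimal sketch. It is not the dissertation's formalism or its Finnish description: the archiphoneme N, the two toy rules, and the helper names (rule_n_assimilation, rule_default, accepts) are all hypothetical, and the rules are written as plain Python predicates rather than finite state automata. Each rule inspects the whole sequence of lexical:surface symbol pairs, and a correspondence is accepted only if every rule accepts it at the same time.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (lexical symbol, surface symbol)


def rule_n_assimilation(pairs: List[Pair]) -> bool:
    """Toy rule N:m <=> _ p: lexical N surfaces as m exactly when p follows."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "N":
            continue
        before_p = i + 1 < len(pairs) and pairs[i + 1][0] == "p"
        if (surf == "m") != before_p:
            return False
    return True


def rule_default(pairs: List[Pair]) -> bool:
    """Toy default: N may surface as n or m; every other symbol is unchanged."""
    for lex, surf in pairs:
        if lex == "N":
            if surf not in ("n", "m"):
                return False
        elif lex != surf:
            return False
    return True


# The rules are unordered: all of them constrain the same pair string.
RULES: List[Callable[[List[Pair]], bool]] = [rule_n_assimilation, rule_default]


def accepts(lexical: str, surface: str) -> bool:
    """Return True if every rule accepts the lexical/surface correspondence."""
    if len(lexical) != len(surface):
        return False
    pairs = list(zip(lexical, surface))
    return all(rule(pairs) for rule in RULES)


if __name__ == "__main__":
    print(accepts("kaNpa", "kampa"))  # N before p realized as m
    print(accepts("kaNpa", "kanpa"))  # N before p realized as n
```

    Because accepts only tests a correspondence, the same check serves both directions: recognition searches for lexical strings that pair with a given surface form, and production searches for surface strings that pair with a given lexical form.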

    Syntactic Methods in the Study of the Indus Script

    Guessing lexicon entries using finite-state methods

    A practical method for interactive guessing of LEXC lexicon entries is presented. The method is based on describing groups of similarly inflected words using regular expressions. The patterns are compiled into a finite-state transducer (FST) which maps any word form into the possible LEXC lexicon entries which could generate it. The same FST can be used (1) for converting conventional headword lists into LEXC entries, (2) for interactive guessing of entries, (3) for corpus-assisted interactive guessing and (4) for guessing entries from corpora. A method of representing affixes as a table is presented, as well as how the tables can be converted into LEXC format for several different purposes, including morphological analysis and entry guessing. The method has been implemented using the HFST finite-state transducer tools and its Python embedding plus a number of small Python scripts for conversions. The method is tested with a near-complete implementation of Finnish verbs. An experiment in generating Finnish verb entries from corpus data is also described, as well as the creation of a full-scale analyzer for Finnish verbs using the conversion patterns.
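
    The pattern-based guessing described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard re module in place of a compiled FST and the HFST tools, and the two patterns, the class names V_SANOA and V_SAADA, and the "stem CLASS ;" entry layout are hypothetical stand-ins rather than the paper's actual patterns or entries. Each pattern describes a group of similarly inflected verbs, and a word form is mapped to every entry that could have generated it.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical inflection patterns: (regex over a word form, class name).
# Each regex captures a stem followed by one of a few toy endings.
PATTERNS: List[Tuple[Pattern[str], str]] = [
    (
        re.compile(r"(?P<stem>[a-zäö]+[ouyö])(?:a|ä|n|t|mme|tte|vat|vät)"),
        "V_SANOA",
    ),
    (
        re.compile(r"(?P<stem>[a-zäö]+(?:aa|uo|yö|ie))(?:da|dä|n|t|mme|tte|vat|vät)"),
        "V_SAADA",
    ),
]


def guess(word_form: str) -> List[str]:
    """Return every candidate entry whose pattern matches the whole word form."""
    candidates = []
    for pattern, class_name in PATTERNS:
        match = pattern.fullmatch(word_form)
        if match:
            # Illustrative entry layout only, not verified LEXC output.
            candidates.append(f"{match.group('stem')} {class_name} ;")
    return candidates


if __name__ == "__main__":
    for form in ["sanon", "juon", "saada", "talo"]:
        print(form, guess(form))
```

    A form such as "juon" matches both toy patterns and so yields two candidates, which is the situation where interactive or corpus-assisted selection among the guesses becomes useful.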

    Common Infrastructure for Finite-State Based Methods and Linguistics Descriptions

    Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology. Peer reviewed

    Nordic co-operation in building the language resource infrastructures

    Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources. Editors: Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard, Eiríkur Rögnvaldsson and Koenraad de Smedt. NEALT Proceedings Series, Vol. 5 (2009), 12-15. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9207

    Corpus of Texts in the Indus Script

    Peer reviewed

    Representing Calendar Expressions with Finite-State Transducers that Bracket Periods of Time on a Hierarchical Timeline

    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 355-362

    Johdatus kieliteknologiaan, sen merkitykseen ja sovelluksiin (Introduction to language technology, its significance and applications)

    Indexing Old Literary Finnish text

    Peer reviewed